aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter

Authors

Abstract

In recent years, deep neural networks (DNN) have been widely used in many fields. A great deal of effort has been put into training them because of the numerous parameters a network contains. Some complex optimizers with many hyperparameters have been used to accelerate network training and improve generalization ability, and tuning these optimizers is often a trial-and-error process. In this paper, we analyze, visually, the different roles that samples play in a parameter update and find that each sample contributes differently to the update. Furthermore, we present a variant of batch stochastic gradient descent for networks that use ReLU as the activation function in the hidden layers, called adaptive stochastic gradient descent (aSGD). Different from existing methods, it calculates an adaptive batch size for each parameter of the model and uses the mean effective gradient for the actual updates. Experimental results on MNIST show that aSGD can speed up the optimization of a DNN and achieve higher accuracy without extra hyperparameters. Experimental results on synthetic datasets show that it can find redundant nodes effectively, which is helpful for model compression.
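As a rough illustration of the mechanism described in the abstract, the NumPy sketch below applies the per-parameter idea to a single ReLU layer: a sample only contributes to a weight's gradient when the corresponding hidden unit is active, so each weight has its own effective batch size, and the update averages over that count instead of the full batch. This is a minimal sketch inferred from the abstract, not the paper's implementation; the function name, single-layer setup, and defaults are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def asgd_like_step(W, X, dL_dH, lr=0.01):
    """One illustrative update for the weights W (d_in x d_hidden) of a ReLU layer.

    X:      input batch, shape (B, d_in).
    dL_dH:  gradient of the loss w.r.t. the layer's ReLU outputs, shape (B, d_hidden).
    Each weight W[i, j] is updated with the mean of only its nonzero per-sample
    gradient contributions, i.e. with a per-parameter effective batch size
    (an assumption drawn from the abstract, not the paper's exact procedure).
    """
    Z = X @ W                                  # pre-activations, shape (B, d_hidden)
    dL_dZ = dL_dH * (Z > 0)                    # gradient through the ReLU
    # per-sample gradient contributions to every weight, shape (B, d_in, d_hidden)
    per_sample = X[:, :, None] * dL_dZ[:, None, :]
    effective_n = np.count_nonzero(per_sample, axis=0).astype(X.dtype)
    grad_sum = per_sample.sum(axis=0)
    mean_grad = np.divide(grad_sum, effective_n,
                          out=np.zeros_like(grad_sum), where=effective_n > 0)
    return W - lr * mean_grad

# toy usage: batch of 32 samples, 5 inputs, 4 hidden units, random upstream gradient
W = rng.normal(size=(5, 4))
X = rng.normal(size=(32, 5))
dL_dH = rng.normal(size=(32, 4))
W_new = asgd_like_step(W, X, dL_dH)
```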


Similar Articles

Adaptive Variance Reducing for Stochastic Gradient Descent

Variance Reducing (VR) stochastic methods are fast-converging alternatives to the classical Stochastic Gradient Descent (SGD) for solving large-scale regularized finite sum problems, especially when a highly accurate solution is required. One critical step in VR is the function sampling. State-of-the-art VR algorithms such as SVRG and SAGA, employ either Uniform Probability (UP) or Importance P...
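For context on the variance-reduced update this line of work builds on, the sketch below shows a minimal SVRG loop with uniform probability sampling (the "UP" case mentioned above). It is an illustrative implementation of the generic technique, not the cited paper's adaptive sampling scheme; the function names and defaults are assumptions.

```python
import numpy as np

def svrg(grad_i, w0, n, step=0.1, epochs=10, inner=None, rng=None):
    """Minimal SVRG for min_w (1/n) * sum_i f_i(w); grad_i(w, i) returns grad f_i(w)."""
    rng = rng or np.random.default_rng(0)
    inner = inner or 2 * n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()                         # snapshot point
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)                   # uniform probability (UP) sampling
            # variance-reduced gradient estimate
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w = w - step * g
    return w

# toy usage: least squares, f_i(w) = 0.5 * (a_i . w - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
w_hat = svrg(lambda w, i: (A[i] @ w - b[i]) * A[i], np.zeros(5), n=100, step=0.01)
```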


Convergence diagnostics for stochastic gradient descent with constant step size

Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...
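One widely used diagnostic of this kind tracks the running sum of inner products between consecutive stochastic gradients: the sum tends to grow while the iterates drift toward a solution and to shrink once they oscillate around it, so a sign change can serve as a convergence signal. The sketch below implements that heuristic for constant-step-size SGD; it is an assumed illustration of the general idea, not necessarily the exact test statistic of the cited paper.

```python
import numpy as np

def sgd_with_convergence_diagnostic(grad_sample, w0, step=0.05, burn_in=100,
                                    max_iters=10_000, rng=None):
    """Constant step-size SGD that stops when a stationarity test fires.

    The running sum of inner products of consecutive stochastic gradients is
    typically positive during the transient phase and turns negative once the
    iterates enter the stationary phase; the test signals convergence when the
    sum drops below zero after a burn-in period.
    """
    rng = rng or np.random.default_rng(0)
    w = w0.copy()
    prev_g, running_sum = None, 0.0
    for t in range(max_iters):
        g = grad_sample(w, rng)                   # one stochastic gradient
        if prev_g is not None:
            running_sum += float(g @ prev_g)
        if t > burn_in and running_sum < 0:
            return w, t                           # stationary phase detected
        w = w - step * g
        prev_g = g
    return w, max_iters
```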


Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent

In this paper we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in terms of samples of each update. We propose to determine the batch size by optimizing the ratio between a lower bound to a linear or quadratic Taylor approximatio...
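To make the trade-off concrete, the sketch below scores each candidate batch size m under a simple L-smooth quadratic model: the expected one-step decrease improves as the gradient estimate's variance shrinks like sigma_sq / m, while the cost of an update grows linearly in m, and the batch size maximizing decrease per sample is returned. The model, the symbols (grad_norm_sq, sigma_sq, step, L), and the search range are assumptions, not the paper's exact bound.

```python
def choose_batch_size(grad_norm_sq, sigma_sq, step, L, m_max=1024):
    """Pick the batch size m maximizing expected one-step decrease per sample.

    Under an L-smooth quadratic model with an unbiased mini-batch gradient of
    per-sample variance sigma_sq, the expected decrease of one SGD step is
    roughly  step * g2 - 0.5 * step**2 * L * (g2 + sigma_sq / m),  where
    g2 = ||grad||^2; the cost of the update is m samples.
    """
    best_m, best_ratio = 1, float("-inf")
    for m in range(1, m_max + 1):
        decrease = step * grad_norm_sq - 0.5 * step ** 2 * L * (grad_norm_sq + sigma_sq / m)
        ratio = decrease / m                      # improvement per sample
        if ratio > best_ratio:
            best_m, best_ratio = m, ratio
    return best_m

# e.g. choose_batch_size(grad_norm_sq=1.0, sigma_sq=50.0, step=0.1, L=10.0) -> 100
```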


A stochastic gradient adaptive filter with gradient adaptive step size

This paper presents an adaptive step-size gradient adaptive filter. The step size of the adaptive filter is changed according to a gradient descent algorithm designed to reduce the squared estimation error during each iteration. An approximate analysis of the performance of the adaptive filter when its inputs are zero mean, white, and Gaussian and the set of optimal coefficients are time varyin...
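A common form of this idea is a variable step-size LMS filter in which the step size itself is updated by gradient descent on the squared estimation error. The sketch below follows that recipe; the update rule shown (mu adjusted by the product of consecutive errors and the regressor correlation) is one standard approximation, and the parameter names, defaults, and clipping range are assumptions rather than the cited paper's exact algorithm.

```python
import numpy as np

def vss_lms(x, d, taps=8, mu0=0.01, rho=1e-4, mu_min=1e-4, mu_max=0.1):
    """Adaptive FIR filter whose step size is adapted by gradient descent.

    At every sample the coefficients follow the usual LMS rule, while the step
    size mu is nudged along an approximate gradient of the squared estimation
    error:  mu <- mu + rho * e[n] * e[n-1] * (u[n] . u[n-1]),  then clipped.
    """
    w = np.zeros(taps)
    mu = mu0
    prev_e, prev_u = 0.0, np.zeros(taps)
    errors = np.zeros(len(x))
    for n in range(taps, len(x)):
        u = x[n - taps + 1:n + 1][::-1]           # current regressor (newest first)
        e = d[n] - w @ u                          # estimation error
        mu = float(np.clip(mu + rho * e * prev_e * (u @ prev_u), mu_min, mu_max))
        w = w + mu * e * u                        # standard LMS coefficient update
        prev_e, prev_u = e, u
        errors[n] = e
    return w, errors

# toy usage: identify an unknown 8-tap system driven by white noise
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
h = rng.normal(size=8)
d = np.convolve(x, h, mode="full")[:len(x)]
w_hat, err = vss_lms(x, d)
```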


Adaptive wavefront control with asynchronous stochastic parallel gradient descent clusters.

A scalable adaptive optics (AO) control system architecture composed of asynchronous control clusters based on the stochastic parallel gradient descent (SPGD) optimization technique is discussed. It is shown that subdivision of the control channels into asynchronous SPGD clusters improves the AO system performance by better utilizing individual and/or group characteristics of adaptive system co...
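The base SPGD update behind such systems perturbs all control channels in parallel, measures the scalar performance metric for the positively and negatively perturbed settings, and moves each channel in proportion to the metric difference times its own perturbation. The sketch below shows only that single-cluster update (the asynchronous clustering discussed above is not modeled); function names and gains are assumptions.

```python
import numpy as np

def spgd_maximize(metric, u0, gain=1.0, perturb=0.05, iters=2000, rng=None):
    """Stochastic parallel gradient descent (here ascent) on a scalar metric.

    Every iteration applies a random +/- perturbation to all control channels
    at once, measures the metric for the + and - perturbed states, and updates
    each channel by the metric difference times its own perturbation.
    """
    rng = rng or np.random.default_rng(0)
    u = u0.copy()
    for _ in range(iters):
        delta = perturb * rng.choice([-1.0, 1.0], size=u.shape)
        dJ = metric(u + delta) - metric(u - delta)
        u = u + gain * dJ * delta                 # parallel update of all channels
    return u

# toy usage: maximize a concave quadratic metric whose optimum is u = 1
u_opt = spgd_maximize(lambda u: -np.sum((u - 1.0) ** 2), np.zeros(16))
```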


Journal

Journal: Mathematics

Year: 2022

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math10060863